Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iqss/7493 batch archiving api call #7494

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Dec 23, 2020

What this PR does / why we need it: Provides a batch mechanism for archiving via the OAI-ORE/Bag mechanism.

Which issue(s) this PR closes:

Closes #7493

Special notes for your reviewer: PR adds a query to find unarchived versions (which have a null entry in the db for archivalCopyLocation) and then loops through the results to archive versions. The archiving of a single version is the same as if the existing single version API call was used.

Suggestions on how to test this: Configure an archiver - probably the file archiver, and call the API with various combinations of the three query params. The result should be a simple json list of versions (for listonly) or the sequential appearance of zipped bags for the correct versions in the specified location (e.g. up to the limit, only for the most recently published version of a dataset, etc.). To retry, setting the archivalCopyLocation for a given datasetversion in the db and removing any bag file from the final location will restore a version to the unarchived state. Note that the Bag contents aren't changed by this PR so there's no reason to inspect their contents. (If an issue such as a 0 size file is found, the same thing should happen if you call the archive a single version API and it would be a bug in the ORE/Bag generator code that should be a new issue.)

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Probably worth noting in release docs

Additional documentation:

@coveralls
Copy link

coveralls commented Dec 23, 2020

Coverage Status

Coverage decreased (-0.02%) to 18.846% when pulling 7047d00 on TexasDigitalLibrary:IQSS/7493-batch_archiving_API_call into 58b1feb on IQSS:develop.

@pdurbin pdurbin self-assigned this Jan 5, 2021
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks fine but @qqmyers can you please review the comments I made and make sure make html works?

@@ -866,9 +866,9 @@ For example:

``cp <your key file> /usr/local/payara5/glassfish/domains/domain1/files/googlecloudkey.json``

.. _Archiving API Call:
.. _Archiving API Calls:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought this anchor was being used and references to it should be change but apparently it isn't. It sort of makes me wonder if we should delete it if it isn't being used. No strong preference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - it seems random as to whether anchors exist or not. They can be useful beyond just internal links (e.g. you can use them to point people directly to that section via URL in email, etc., so having anchors might be a good default for any significant topics. That said, not a show stopper if it gets deleted.

doc/sphinx-guides/source/installation/config.rst Outdated Show resolved Hide resolved
doc/sphinx-guides/source/installation/config.rst Outdated Show resolved Hide resolved
doc/sphinx-guides/source/installation/config.rst Outdated Show resolved Hide resolved
} catch (EJBException e) {
logger.log(Level.WARNING, "EJBException exception: {0}", e.getMessage());
return null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 9 spaces instead of 8 in a couple places above. Also, the Netbeans formatter prefers try { instead of try{.

@@ -987,7 +987,7 @@ public boolean doesChecksumExistInDatasetVersion(DatasetVersion datasetVersion,



public HashMap getFileMetadataHistory(DataFile df){
public HashMap<?, ?> getFileMetadataHistory(DataFile df){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change to getFileMetadataHistory seems to have nothing to do with this pull request and in fact I don't think the method is used at all. (I deleted it as was still able to compile the app.) I would suggest reverting this change and maybe adding a comment/question to the effect of "Is this method used?"

AbstractSubmitToArchiveCommand cmd = ArchiverUtil.createSubmitToArchiveCommand(className, dvRequestService.getDataverseRequest(), dsl.get(0));

if (cmd != null) {
new Thread(new Runnable() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not personally very familiar with threads but there do seem to be a couple other places in the code (also added by Jim) that use this new Thread/new Runnable pattern and work fine. I think Jakarta EE offers various ways of handling threads and I know we use @Asynchronous in certain places. I'm not suggesting a change but perhaps threads or async in general could be a topic for a future tech hours so we're all clear on the various options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! FWIW: I don't know details but I think this is basically an EJB vs plain Java difference. Using @Asyncrhonous starts a thread in a managed pool (and I think there's a ManagedExcutorService that would allow you to specify a separate pool that could be managed in the EJB config). Probably would be good to recommend Asynch and think about updating these at some point. That said, since this call just starts one thread to serially chunk through things so I don't think it should cause problems as is (and having others who know more weigh in before making a change might be useful).

@pdurbin pdurbin assigned qqmyers and unassigned pdurbin Jan 5, 2021
@qqmyers qqmyers added the TDL of interest to the Texas Digital Library label Mar 6, 2021
@qqmyers qqmyers closed this Feb 23, 2022
@qqmyers qqmyers deleted the IQSS/7493-batch_archiving_API_call branch February 23, 2022 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TDL of interest to the Texas Digital Library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support batch archiving
3 participants